Feller’s Contributions to Mathematical Biology

Ellen Baake* and Anton Wakolbinger^ 


Abstract 

This is a review of William Feller’s important contributions to mathematical biology.
The seminal paper [Feller 1951] Diffusion processes in genetics was particularly influential
on the development of stochastic processes at the interface to evolutionary biology,
and interesting ideas in this direction (including a first characterization of what is nowadays
known as “Feller’s branching diffusion”) already took shape in the paper [Feller 1939]
(written in German) The foundations of a probabilistic treatment of Volterra’s theory of the
struggle for life. Feller’s article On fitness and the cost of natural selection [Feller 1967]
contains a critical analysis of the concept of genetic load.

The present article will appear in: Schilling, R.L., Vondracek, Z., Woyczynski, W.A.: 
The Selected Papers of William Feller. Springer Verlag. 


1 Introduction 

Feller had a persistent interest in biology. This is documented by numerous examples from
mathematical genetics in his monograph [Feller 1950, Feller 1966], and by a couple of influential
research papers at the interface of population biology and probability theory. Looking
back at these papers in historical perspective is highly rewarding: they are cornerstones of
biomathematics; they mirror the development of probability theory of their time; and at least
one of them ([Feller 1951]) had a lasting impact on probability theory.

Feller’s important papers on the interface to biology are [Feller 1939], [Feller 1951], and
[Feller 1967]. The first one addresses general population dynamics, the other two are mainly
concerned with models in population genetics. The area of population dynamics is concerned
with the growth, stabilisation, decay, or extinction of populations. Models of population dynamics
describe how the size of populations changes over time under given assumptions on
birth and death rates of individuals, which may depend on the current population size since individuals
interact (e. g. compete) with each other. In contrast, population genetics is concerned
with the genetic composition of populations under the action of various evolutionary processes,
such as mutation and selection. Naturally, there is no sharp boundary between the fields, as we
will also see in Feller’s contributions. Let us now look at them.


2 Feller and population dynamics 

In [Feller 1939], a paper still in German entitled (in English translation) The foundations of a
probabilistic treatment of Volterra’s theory of the struggle for life, Feller presents a synthesis of
two fundamental developments that both started in 1931. On the one hand, Volterra presented
his book Lessons about the mathematical theory of the struggle for life ETll; on the other hand,
Kolmogorov published his seminal paper On analytical methods in probability theory IfTTlI.
Volterra’s book laid the foundations for the deterministic description of population dynamics
in terms of systems of ordinary differential equations that model birth, death, and interaction
of individuals. These models presuppose that populations are so large that random fluctuations
can be neglected, and that population sizes are measured in units so large that the size can be
considered a continuous quantity. Kolmogorov presented the general and systematic formalism
for the description of stochastic dynamics in terms of Markov chains in continuous time; in
particular, he found the description for the evolution of probability weights and the transport
of expectations in terms of differential equations, which we know today under the names of
Kolmogorov forward equations and Kolmogorov backward equations.

In his 1939 paper, Feller ties these two fundamental developments together by applying
Kolmogorov’s new formalism to some examples of Volterra’s population dynamics. We see 
here the birth of the stochastic description of population dynamics, which today has its firm 
place in mathematical biology, and is highly developed both in analytical terms and in terms of 
simulations. 

Feller’s paper is devoted to the description of single populations (except for a small excursion
to predator-prey models at the end) and consists of two large parts. The first establishes the
Kolmogorov forward equations (KFE) for the Markov jump processes (namely, birth-and-death 
processes) that describe finite populations (remarkably, there is no mention of the Kolmogorov 
backward equations in this paper). The second part discusses a continuum analogue of such 
processes, a special case of which seems to be the first appearance of what today is called 
Feller’s branching diffusion. 

It is remarkable to see (and a pleasure to read) that Feller notices some of the crucial relationships
between corresponding deterministic and stochastic models in this early paper, where they
appear as a central theme.

For the sake of clarity, let us make explicit here the two fundamental limits of birth-death
processes that are addressed in [Feller 1939]. Consider a birth-death process $K_N(t)$ with birth
rate $n\lambda$ and death rate $n\mu$ when in state $n$, with $N$ being the initial population size. Then, as
the initial population size $N$ tends to $\infty$, the sequence of processes $(K_N(t)/N)_{t \ge 0}$, $N = 1, 2, \ldots$,
converges in distribution to the solution of the differential equation

(1) $\dot{x} = (\lambda - \mu)\,x, \quad x(0) = 1.$

This reflects a dynamical version of the Law of Large Numbers (see e. g. [HH). (Notably, due
to the linearity, the expectation $M(t) := E(K_1(t))$ satisfies (1) as well.) A different kind of limit
emerges if one assumes that the individual split and death rates $\lambda$ and $\mu$ depend on $N$ and the
process is nearly critical in the sense that

$\lambda_N = \beta + \theta_1/N$ and $\mu_N = \beta + \theta_2/N$,

with $\theta_1 - \theta_2 =: \alpha$. The Law of Large Numbers then says that the limit of the processes
$(K_N(t)/N)_{t \ge 0}$ is the constant 1. However, on a larger time scale the fluctuations become visible:
the sequence of processes $(K_N(Nt)/N)_{t \ge 0}$ converges in distribution to the solution of the
stochastic differential equation (5.1') stated in paragraph 3.1.1, whose diffusion equation is (4).
This is a prototype of a diffusion limit for birth-death processes. In [Feller 1939], these limiting
procedures are not made explicit (but see [Feller 1951] for a major step in this direction). Feller
in 1939 goes rather the other way, in search of stochastic processes that correspond to a given
deterministic model. Let us now explain the major lines of his article.
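
The first of these two limits can be illustrated numerically. The following is a minimal sketch, not taken from Feller's paper: it simulates the linear birth-death process by drawing exponential waiting times between events and compares $K_N(t)/N$ with the solution $e^{(\lambda-\mu)t}$ of (1). The function name and all parameter values are illustrative assumptions.

```python
import math
import random


def simulate_linear_birth_death(n0, lam, mu, t_end, rng):
    """Population size at time t_end of a linear birth-death process.

    Each individual splits at rate lam and dies at rate mu, independently
    of all others (requires lam + mu > 0). Starts with n0 individuals.
    """
    n, t = n0, 0.0
    while n > 0:
        # Total event rate in state n is n * (lam + mu).
        t += rng.expovariate(n * (lam + mu))
        if t > t_end:
            break
        # The next event is a birth with probability lam / (lam + mu).
        if rng.random() < lam / (lam + mu):
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed, chosen arbitrarily
    lam, mu, t_end = 1.0, 0.5, 1.0
    for n0 in (10, 100, 1000, 10000):
        ratio = simulate_linear_birth_death(n0, lam, mu, t_end, rng) / n0
        print(n0, ratio, math.exp((lam - mu) * t_end))
```

For growing initial size the simulated ratio should settle near $e^{(\lambda-\mu)t}$, with fluctuations of order $N^{-1/2}$; the nearly critical regime, in which these fluctuations survive on the time scale $Nt$, is not covered by this sketch.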
2.1 Markov jump processes for population dynamics

In the first part (Sections 1-4), devoted to the stochastic description of finite populations, Feller
explains a variety of birth-and-death processes and sets up the Kolmogorov forward equations
for them, i. e. he establishes the system of differential equations that describe how the probability
weights for the number of individuals alive at time $t$ evolve over time. He starts with the
simple linear death process (where each individual dies at rate $\lambda$, independently of all others),
proceeds via the corresponding birth process and the linear birth-and-death process and finally
arrives at the general birth-and-death process. In an individual-based picture, the latter includes
interaction between individuals, so that the birth and/or death rates are no longer linear in the
number of individuals. The case of logistic growth, which includes a quadratic competition
term, serves as an important example; the case of ‘positive interaction’ (such as symbiosis) is
not treated explicitly here. Let us comment on the major insights of this part.
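
In today's notation, the Kolmogorov forward equations of a general birth-and-death process can be written as below. The symbols $b_n$ and $d_n$ for the total birth and death rates in state $n$ are our own notation and need not agree with Feller's; $P_n(t)$ is the probability that $n$ individuals are alive at time $t$.

```latex
\begin{equation*}
\frac{d}{dt} P_n(t)
  = b_{n-1}\, P_{n-1}(t) + d_{n+1}\, P_{n+1}(t) - (b_n + d_n)\, P_n(t),
  \qquad n = 0, 1, 2, \ldots,
\end{equation*}
% conventions: b_{-1} = 0 and d_0 = 0
```

The linear death process corresponds to $b_n = 0$, $d_n = \lambda n$, the linear birth-and-death process to rates proportional to $n$, and interaction between individuals shows up as a nonlinear dependence of $b_n$ or $d_n$ on $n$.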

2.1.1 Kolmogorov equations, their solutions, and relationship with deterministic description.

Feller notices that for a given net reproduction rate $\alpha$ per individual, by choosing $\lambda - \mu = \alpha$,
one obtains a variety of linear birth-death processes whose expectation value $M(t)$ satisfies one
and the same ODE (1), whereas for $\alpha > 0$, there is exactly one linear pure birth process
($\lambda = \alpha$, $\mu = 0$) with this property. Feller states this ambiguity explicitly when discussing logistic
growth. Its deterministic version is given by the differential equation

(2) $\dot{m} = m(\lambda - \gamma m) =: f(m),$

which Feller also calls the Pearl-Verhulst equation. Here $m$ is shorthand for $m(t)$, the ‘deterministic
version’ of the population size at time $t$, $\lambda$ denotes the per capita net reproduction rate
in the absence of competition, and $\gamma$ is the competition parameter. Again, Feller notices that
there are many possibilities in terms of birth-death processes that correspond to (2). They are
parametrised in his Eq. (27), which describes the process with per capita birth at rate $\omega - \nu n$
and per capita death at rate $\tau - \sigma n$ if there are currently $n$ individuals. Here, we have renamed
$\gamma$ in Feller’s Eq. (27) by $\nu$ in order to achieve compatibility with the notation in (2).

Feller starts out by calculating the explicit solution to the KFE of the pure linear death process,
that is, the distribution of the number of individuals alive at time $t$; he states this as the result
of a recursive construction. With a typo corrected in his formula (6) (a term $\cdots - 1$ must be
replaced by $1 - \cdots$), this same formula says that the number of individuals alive at time $t$ has
a binomial distribution with parameters $N$ and $e^{-\lambda t}$ if there are initially $N$ individuals. (Today,
after [Feller 1950], we would conclude this immediately, without solving systems of Kolmogorov
forward equations, via the probabilistic argument that there are initially $N$ independent
individuals, each of which dies at rate $\lambda$ and is therefore alive at time $t$ with probability
$e^{-\lambda t}$.) Likewise, the solution of the pure linear birth process with per capita birth rate $\lambda$,
which he gives in his Eq. (17), is the negative binomial distribution with parameters $N$ and
$e^{-\lambda t}$, which arises as the distribution of the sum of $N$ independent random variables that are
geometrically distributed with parameter $e^{-\lambda t}$. Again, this has a nice interpretation as the
offspring of $N$ independently reproducing ancestors.
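
That the binomial weights indeed solve the forward equations of the pure death process, $\frac{d}{dt}P_k = \lambda (k+1) P_{k+1} - \lambda k P_k$, can be checked numerically. The sketch below is ours, not Feller's recursive construction: it compares a central finite difference of the binomial weights in $t$ with the right-hand side of the KFE. Function names, the step size `h` and the parameter values are illustrative assumptions.

```python
import math


def death_process_pmf(k, n0, lam, t):
    """P(k individuals alive at time t) for n0 initial individuals dying at rate lam."""
    p = math.exp(-lam * t)  # survival probability of a single individual
    return math.comb(n0, k) * p**k * (1.0 - p) ** (n0 - k)


def kfe_residual(k, n0, lam, t, h=1e-5):
    """Left-hand side minus right-hand side of the KFE at state k and time t.

    The time derivative is approximated by a central difference with step h.
    """
    lhs = (death_process_pmf(k, n0, lam, t + h)
           - death_process_pmf(k, n0, lam, t - h)) / (2.0 * h)
    inflow = death_process_pmf(k + 1, n0, lam, t) if k < n0 else 0.0
    rhs = lam * (k + 1) * inflow - lam * k * death_process_pmf(k, n0, lam, t)
    return lhs - rhs


if __name__ == "__main__":
    n0, lam, t = 10, 0.7, 1.3
    print(max(abs(kfe_residual(k, n0, lam, t)) for k in range(n0 + 1)))
```

The residual should vanish up to discretisation and rounding error for every state $k$.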

For the general birth process, with arbitrary birth rates $p_n$, Feller notes that the KFE define
a probability distribution if and only if either only finitely many of the $p_n$ are positive, or
$\sum_n 1/p_n$ diverges; this is a standard textbook result today (usually presented in the generalisation
to birth-and-death processes). Under the conditions stated, he also gives the explicit
solution in passing.


2.1.2 The moments of the stochastic process, and their relationship with the deterministic equation

Feller is particularly interested in the expectation, variance, and other moments of the (random)
number of individuals alive at time $t$. In a trendsetting way, he does not calculate them from
the explicit solution of the KFE, even where this is known; he rather uses the KFE to derive
differential equations for the moments. Let $M(t) = \sum_k k\,P_k(t) = E[K(t)]$ be the expected number
of individuals at time $t$. As stated above, Feller observes that, for the linear birth-and-death
process, $M(t)$ follows the differential equation for the deterministic population model, and
hence the expectation of the stochastic process coincides with the deterministic solution. In
contrast, for the logistic model, he finds from the differential equation relating the first and the
second moment that

(3) $\dot{M} < f(M)$

with $f$ of Eq. (2). From this he argues that $M$ is always less than the solution of the logistic
equation. An alternative way to see (3) would be to observe that the KFE gives

$\frac{d}{dt} E[K(t)] = E[g(K(t))],$

where $g(k) = \sum_n Q(k,n)\,n$ and

$Q(n,n+1) = \lambda n$, $Q(n,n-1) = \gamma n^2$, $Q(n,n) = -(\lambda n + \gamma n^2)$, $Q(k,n) = 0$ otherwise.

As a matter of fact, it turns out that $g(k) = \lambda k - \gamma k^2$, which is strictly concave, and hence (3)
is a consequence of Jensen’s inequality. Since Feller does not consider models with positive
interaction (such as symbiosis) in this part of the paper, he does not encounter the convex
situation.
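
The step from the moment equation to (3) can be spelled out in one line. The following is a sketch for the logistic process with the rates $Q$ given above; the symbol $V(t)$ for the variance of $K(t)$ is our notation.

```latex
\begin{align*}
\frac{d}{dt} M(t)
  &= E\bigl[\lambda K(t) - \gamma K(t)^2\bigr]
   = \lambda M(t) - \gamma\, E\bigl[K(t)^2\bigr] \\
  &= \lambda M(t) - \gamma M(t)^2 - \gamma V(t)
   = f\bigl(M(t)\bigr) - \gamma V(t)
   \;\le\; f\bigl(M(t)\bigr).
\end{align*}
```

The inequality is strict as soon as $V(t) > 0$, i.e. as soon as the population size is genuinely random; the correction term $\gamma V(t)$ is exactly the gap in Jensen's inequality for the concave function $g$.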

2.2 Diffusion equations for population dynamics 

The second part of the paper (Sections 5-8 and 10) is devoted to the diffusion limit of stochastic
population dynamics. We cannot resist quoting Feller’s thoughts from the beginning of Section
5, formulated in an almost literary German, about the substantially more supple probabilistic
treatment, in which the population size is no longer assumed to be integer-valued, and where he
alludes to similarities with Brownian motion:

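Wir wenden uns nun der anderen von der in der Einleitung erwähnten
wahrscheinlichkeitstheoretischen Behandlungsweisen des Wachstumsproblems zu, welche
wesentlich geschmeidiger ist, und bei der die Grösse der Population nicht mehr
ganzzahlig vorausgesetzt wird. Den Mechanismus des Vorgangs kann man sich
hier ähnlich wie bei der Brownschen Molekularbewegung vorstellen. Der Zustand
der betrachteten Population, d. h. ihre gesamte Lebensenergie ist einer dauernden
Veränderung unterworfen [...]

(In English translation: We now turn to the other of the probabilistic treatments of the
growth problem mentioned in the introduction, which is considerably more supple, and in
which the size of the population is no longer assumed to be integer-valued. The mechanism
of the process can be imagined here in a similar way as for Brownian molecular motion. The
state of the population under consideration, i. e. its total vital energy, is subject to
continual change [...])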

Starting from the transition density, Feller calculates the infinitesimal drift $a(x)$ and the
infinitesimal variance $b(x)$ (provided they exist). With remarkable intuition, and a clear view
of the branching property, he states that, in the case of a stochastically independent reproduction
of the individuals, $a(x)$ and $b(x)$ must be proportional to $x$. Again, let us quote in German:

Nimmt man beispielsweise an, dass die Grösse der Population keinen Einfluss hat
auf die durchschnittliche Vermehrungsgeschwindigkeit der Einzelindividuen, d.h.
dass diese untereinander stochastisch unabhängig sind [...], so müssen a(x) und
b(x) offenbar proportional zu x sein [...]

(In English translation: If one assumes, for example, that the size of the population has no
influence on the average rate of reproduction of the single individuals, i. e. that these are
stochastically independent of each other [...], then a(x) and b(x) must evidently be
proportional to x [...])


This gives rise to his equation (38), which reads as

(4) $\frac{\partial w(t,x)}{\partial t} = \beta\,\frac{\partial^2 \bigl(x\,w(t,x)\bigr)}{\partial x^2} - \alpha\,\frac{\partial \bigl(x\,w(t,x)\bigr)}{\partial x}.$

Here $w(t,\cdot)$ is the density of the population size at time $t$, and $\alpha$ and $\beta$ are positive constants. This
seems to be the first appearance of what became famous as Feller’s branching diffusion. We
will come back to this in Section 3.1.

It is interesting to note that the diffusion process in this second part of the paper is not
derived from the birth-death jump processes which Feller has presented in the first part; maybe,
at this early stage, the subtle rescaling required for this limit was not yet at his fingertips.
A decade later, however, he had these techniques; see Section 3.1 on Diffusion processes in
genetics. In 1939, Feller does allude to the birth-death processes, but the connection is not yet
clear. For example, he tells us that the $\alpha$ in (4) corresponds to the $\lambda$ encountered in the pure
birth process. This is correct for the expected growth rate, but as a matter of fact a pure birth
process cannot have the diffusion limit (4), since the paths of the former can only increase in
time, whereas the paths of the latter have fluctuations in both directions.

This issue reappears when Feller discusses the extinction probability of the diffusion process.
He notes the important fact that this quantity increases with $\beta$, since it is tied to the
fluctuations of the process, and at the same time emphasises as a sort of paradox that, even for
a positive net growth $\alpha > 0$, the diffusion process may die out with positive probability, while
the population described by the deterministic differential equation (1), as well as a pure birth
process, cannot die out. (This paradox is resolved when one has in mind the different rescalings
that lead to (1) and (4).)
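
The dependence of the extinction probability on $\beta$ can be made explicit by a standard computation with the generator of the diffusion whose forward equation is (4). The following is a sketch in today's language, not Feller's own derivation; $h(x)$ denotes the probability of eventual extinction when the process starts at $x > 0$, and we assume $\alpha > 0$.

```latex
\begin{align*}
\beta x\, h''(x) + \alpha x\, h'(x) &= 0,
  \qquad h(0) = 1, \quad \lim_{x \to \infty} h(x) = 0, \\
h'(x) \propto e^{-\alpha x / \beta}
  \quad &\Longrightarrow \quad
  h(x) = e^{-\alpha x / \beta}.
\end{align*}
```

For every fixed $x > 0$ and $\alpha > 0$ this probability lies strictly between 0 and 1 and increases with $\beta$, in line with the two observations made above.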

Following these considerations of the linear birth-and-death process, Feller includes dependence
between individuals in Sections 8 and 10. He presents two specific examples. The first
is his Equation (51), which is the diffusion version of his Equation (7) and known today as
Feller’s branching diffusion with logistic growth [20, 23]. The second is the (two-dimensional)
diffusion describing a two-species model with predator-prey interactions, which now is also
called Lotka-Volterra process, see Eq. (1.2) in Q. As with the jump processes in the first part
of his paper, Feller is concerned with the moments of the diffusion processes and writes down a
general recursion for the $k$th moments $M_k$. In the two-species model, the interaction is positive
(from the point of view of the predator), so we finally encounter the convex case, in which the
expectation is greater than the solution of the corresponding ODE.

In Section 9, Feller makes some final remarks concerning the deterministic limit of both the
birth-death jump processes and the branching diffusion. These are brief, heuristic calculations,
which hint at the convergence of the stochastic models to Volterra’s population models in the
limit of infinite population size. Today, powerful laws of large numbers are available for large
classes of such processes [8, Chap. 11]. They go far beyond the simple linear case alluded to in
(1); rather, they include quite general forms of density dependence. This leads us to the present
state of population dynamics.
2.3 Afterthoughts


Today, 75 years after [Feller 1939], stochastic population dynamics constitutes a vibrant area
of research, so wide that it is impossible to give an overview in a short paragraph. Suffice
it to say that major questions raised by Feller continue to be ardent research themes. Above
all, this is true of interactions within and between populations. Even simple models for the
competition of two populations, whose deterministic limit can be tackled as an easy exercise,
turn into hard problems when considered probabilistically. Specifically, diffusion models with
interaction have become objects of intense research, see, e.g., [6, ?] and references therein.

In the context of this commentary, it is particularly noteworthy that a class of models known
under the name of adaptive dynamics brings together ecological aspects (on a short time scale)
and genetical aspects (on a longer time scale) and thus builds a bridge between population
dynamics and population genetics. A nice overview of this topic and many others may be found
in the monograph by Haccou, Jagers, and Vatutin IfT^. Let us now turn to Feller’s contributions
to population genetics.


3 Feller and population genetics 

As already laid out in Section 1, important processes in population genetics are those that
describe the evolution of type frequencies, or in other words, of proportions of subpopulation
sizes within a total population, whose size may vary as well. In this context, we may think of
the individuals as genes, where each gene is of a certain type, say a or A.

The foundations of mathematical population genetics were laid starting in the 1920s by
Fisher, Wright, and Haldane. Their work mirrors the genetics of their time, today known as
classical genetics. It had to rely on the phenotypic appearance of individuals (colour of flower,
surface structure of peas, body weight, milk yield ...). The molecular basis of genetics was
still unknown, so genes had to be treated as abstract entities. When molecular genetics entered
the labs in the 1960s, population genetics changed dramatically, with Kimura as a leading
figure, see Section 3.2.1. The next (and, from a 2014 perspective, the last) big leap took place
in 1982, when Kingman introduced the genealogical perspective via the coalescent process.
Comprehensive overviews of population genetics theory are given in the textbooks by Ewens
[9] and Durrett [5]; for coalescent theory in particular, we further recommend Berestycki [2]
(from a mathematical point of view) and Wakeley [28] (from a more biological perspective).

With his contributions to population genetics, Feller thus was in the midst of an important
line of development. We will comment on two of these articles. The first, Diffusion processes
in genetics [Feller 1951], is a landmark contribution towards stochastic modelling and analysis
via diffusion processes, and, as a matter of fact, reaches far beyond population genetics as
such. The second, On fitness and the cost of natural selection [Feller 1967], uses deterministic
modelling (and is therefore similar in spirit to the ‘Volterra equations’).

3.1 Diffusion processes in genetics 

Feller’s article Diffusion processes in genetics [Feller 1951] appeared in the Proceedings of
the 2nd Berkeley Symposium on Mathematical Statistics and Probability, which took place in
1950. The central role of [Feller 1951] is nicely put into perspective by the following quote
from Thomas Nagylaki’s review [24] on Gustave Malecot and the transition from classical to
modern population genetics:


Mathematical research in diffusion theory influenced population genetics only gradually.
As described in more detail below, Wright was unaware of Kolmogorov’s
(1931) pioneering paper, and Wright, Malecot, and Kimura were all apparently 
unacquainted with Khintchine’s (1933) book.[...] Thus, the mutually beneficial 
cross-fertilization between diffusion theory and population genetics did not start 
until Feller published his seminal 1951 paper. 

In the introduction of that paper, Feller sets the stage by writing: 

Relatively small populations require discrete models, but for large populations it 
is possible to apply a continuous approximation, and this leads to processes of the 
diffusion type. 

Two diffusion processes are in the focus of the paper. One is what is nowadays called
Feller’s branching diffusion, the other is the so-called Wright-Fisher diffusion. Feller describes
them by their diffusion equations (5.1) and (7.1), which are the Kolmogorov forward equations
(or Fokker-Planck equations) for the densities, here called $u(t,x)$, cf. Section 2.2. Feller writes
on pp. 228-229:

It is known that an essential part of Wright’s theory is mathematically equivalent 
to assuming a certain diffusion equation for the gene frequency (that is, the proportion
of a-genes).

In a footnote on the same page, Feller gives hints to the roots of this knowledge in the work of
Kolmogorov, Fisher, Wright, and Malecot.

3.1.1 A foresight: Feller’s diffusions as solutions of stochastic differential equations 

Nowadays we do not hesitate to write the processes described by Feller’s equations (5.1) and
(7.1) as solutions of stochastic differential equations in the sense of Itô:

(5.1') $dZ_t = \sqrt{2\beta Z_t}\,dW_t + \alpha Z_t\,dt,$

(7.1') $dY_t = \sqrt{2\beta Y_t(1 - Y_t)}\,dW_t + \bigl(\gamma_2(1 - Y_t) - \gamma_1 Y_t\bigr)\,dt,$

where $W$ is a standard Brownian motion. Feller legitimately resisted writing the processes
in this form. In [Feller 1952], which grew out of Feller’s invited lecture at the International
Congress of Mathematicians in the year 1950, he writes about Itô’s Stochastic Analysis:

This approach has the advantage that it permits a direct study of the properties 
of the path functions, such as their continuity, etc. In principle, we have here a 
possibility of proving the existence theorems for the partial differential equations 
[... ] directly from the properties of the path functions. However, the method is for 
the time being restricted to the infinite interval and the conditions on [the diffusion
and drift coefficients] a and b are such as to guarantee the uniqueness of the
solution. So far, therefore, we cannot obtain any new information concerning the 
“pathological” cases. 
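
For readers who wish to experiment with the two processes, here is a minimal Euler-Maruyama sketch of the stochastic differential equations for Feller's branching diffusion and the Wright-Fisher diffusion given above. It is an illustration under our own assumptions, not a scheme discussed by Feller: function names, step sizes and parameter values are hypothetical, and the truncation of the state to $[0,\infty)$ and $[0,1]$ after each step is an ad hoc device that is crude precisely near the boundaries, where the square-root coefficients degenerate.

```python
import math
import random


def euler_feller(z0, alpha, beta, t_end, n_steps, rng):
    """Euler-Maruyama path for dZ = sqrt(2*beta*Z) dW + alpha*Z dt, truncated at 0."""
    dt = t_end / n_steps
    z, path = z0, [z0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        z = z + alpha * z * dt + math.sqrt(2.0 * beta * z) * dw
        z = max(z, 0.0)  # ad hoc truncation; 0 is then absorbing for the scheme
        path.append(z)
    return path


def euler_wright_fisher(y0, gamma1, gamma2, beta, t_end, n_steps, rng):
    """Euler-Maruyama path for the Wright-Fisher SDE with mutation, truncated to [0, 1]."""
    dt = t_end / n_steps
    y, path = y0, [y0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        drift = gamma2 * (1.0 - y) - gamma1 * y
        y = y + drift * dt + math.sqrt(2.0 * beta * y * (1.0 - y)) * dw
        y = min(max(y, 0.0), 1.0)  # ad hoc truncation to the state space
        path.append(y)
    return path


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed
    print(euler_feller(1.0, 0.5, 1.0, 2.0, 2000, rng)[-1])
    print(euler_wright_fisher(0.5, 0.3, 0.6, 1.0, 2.0, 2000, rng)[-1])
```

Such a scheme says nothing about the “pathological” boundary behaviour that concerned Feller; it merely produces approximate paths in the interior of the state space.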
3.1.2 An emerging theme: What happens at the boundary?


Indeed, the state spaces of (5.1) and (7.1) are not the ‘infinite interval’ $(-\infty, \infty)$ but $[0, \infty)$ and
$[0, 1]$, and it took 20 years until T. Yamada and S. Watanabe proved that the coefficients in the
above stated SDEs are good enough to guarantee strong uniqueness of the solution, see ifSTll
and also OOll. A coupling argument from Stochastic Analysis then guarantees that the solution
of (7.1') converges in law to the unique equilibrium distribution whose density is the unique
invariant probability density of (7.1), which is the Beta($\gamma_2/\beta$, $\gamma_1/\beta$)-density. Thus, although
for $\gamma_2 < \beta$ the random path $Y$ hits 0 with probability one (and similarly for $\gamma_1 < \beta$ it hits 1
with probability one), these visits to the boundary do not lead (as conjectured by Feller on
p. 239) to a non-vanishing accumulation of the masses concentrated at x = 0 and x = 1 which
is maintained in the steady state; in other words, the coefficient $\mu$ in his equation (7.3) is in fact
equal to 1.

Questions like these may have been one source of motivation for Feller to initiate his
groundbreaking studies on the boundary classification of diffusion processes, see his footnote
on p. 234, where he speaks of boundary conditions of an altogether new type, and the one on
p. 229 added in proof, where he announces that a systematic theory, including the new boundary
condition, is to appear in the Annals of Math. Feller’s classification of boundaries is reviewed
and commented on in Section 2 of Masatoshi Fukushima’s essay in this volume.

3.1.3 The diffusion approximation of the Wright-Fisher chain and beyond

As already indicated, another important aspect that is taken up in Feller’s paper is that of the
diffusion approximation, i. e. the convergence of a sequence of (properly scaled) discrete processes
to the solution of (5.1) and (7.1), respectively. In the former case the underlying discrete
process is a Galton-Watson process, in the latter it is the Wright-Fisher Markov chain. The
transition probabilities of the Wright-Fisher chain are given by (3.2), (3.4) and (3.5). The diffusive
mass-time-scaling is given by (8.5): a unit of time consists of $N$ generations, and a unit of
mass consists of $N$ (or here $2N$) individuals, with $N$ being the total population size. The scaling
(8.4) of the individual mutation probabilities $\alpha_1$, $\alpha_2$ is that of weak mutation, which leads in the
scaling limit to the infinitesimal mean displacement $a(x)$ and the infinitesimal variance $2b(x)$,
see (8.6) and (8.7). In the context of (7.1), the drift coefficient $a(x)$ is due to the effect of
mutation, and the diffusion coefficient $b(x)$ describes the strength of the fluctuations that come
from the random reproduction. (In order to be consistent with (7.1), $\beta_i$ should be replaced by
$\gamma_i$ in (8.4), (8.6) and (8.9).) The ‘convergence of generators’ which emerges from (8.6) and
(8.7) can be lifted to the convergence of the corresponding semigroups, see e. g. the chapter on
Genetic Models in the monograph [8] by Ethier and Kurtz.
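
The emergence of drift and variance under weak mutation can be checked by an exact one-step computation. The sketch below rests on our own normalisation, which may differ from Feller's (8.4), (8.5) by constant factors: a population of $m$ genes, mutation probabilities $\gamma_1/m$ (from type a to A) and $\gamma_2/m$ (from A to a) per generation, and one unit of time equal to $m$ generations. All function and variable names are hypothetical.

```python
from fractions import Fraction


def wright_fisher_step_moments(i, m, u, v):
    """Exact mean and variance of the next number of a-genes.

    i: current number of a-genes among m genes,
    u: mutation probability a -> A, v: mutation probability A -> a.
    The next count is Binomial(m, p) with p = x*(1 - u) + (1 - x)*v, x = i/m.
    """
    x = Fraction(i, m)
    p = x * (1 - u) + (1 - x) * v
    return m * p, m * p * (1 - p)


def rescaled_drift_and_variance(i, m, gamma1, gamma2):
    """Mean and variance of the one-step change in frequency, per unit of rescaled time."""
    u = Fraction(gamma1) / m  # weak mutation: probabilities of order 1/m
    v = Fraction(gamma2) / m
    mean_next, var_next = wright_fisher_step_moments(i, m, u, v)
    x = Fraction(i, m)
    drift = m * (mean_next / m - x)      # m generations per unit of time
    variance = m * (var_next / m**2)
    return x, drift, variance


if __name__ == "__main__":
    for m in (100, 10_000, 1_000_000):
        x, drift, variance = rescaled_drift_and_variance(3 * m // 10, m, 2, 3)
        print(m, float(drift), float(variance), float(x * (1 - x)))
```

In this normalisation the rescaled drift equals $\gamma_2(1-x) - \gamma_1 x$ exactly for every $m$, while the rescaled variance converges to $x(1-x)$, which corresponds to $\beta = 1/2$ in (7.1').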

The convergence theorems in [8] comply with Feller’s programmatic proposal: It should be
proved that our passage to the limit actually leads from (8.2) to (8.6), i. e. from the probability
weights of the Wright-Fisher chain to the probability densities of the Wright-Fisher diffusion.
To achieve this, Feller proposed an expansion into eigenfunctions (in particular he found the
eigenvalues of the Wright-Fisher transition semigroup) and checked part of the convergence
in Section 8 and Appendix I. Such a representation is not required in the systematic approach
presented in [8]. Still, the approach via eigenfunctions is interesting in its own right, and has
been extensively used in Mathematical Biology.

At the beginning of Section 9 (entitled Other possibilities) Feller writes:

The described passage to the limit which leads to Wright’s diffusion equation (7.1)
is different from the familiar similar processes in physical diffusion theory where
the ratio $\Delta x/\Delta t$ tends to infinity rather than to a constant. It rests entirely on the
assumption (8.4) [of weak mutation]. We shall now see that any modification of
this assumption leads to a non-singular diffusion equation of the familiar type (to
normal distributions).

Indeed, for the scaling (9.1), (9.2) of the mutation probabilities $\alpha_i$, $i = 1, 2$, which corresponds
to strong mutation, Feller states a law of large numbers, i. e. a convergence of the type
frequencies to the equilibrium point $\gamma_2/(\gamma_1 + \gamma_2)$, and argues that the (properly scaled)
process of fluctuations around this equilibrium point converges to a process whose probability
density satisfies the diffusion equation (9.10) (and thus is an Ornstein-Uhlenbeck process).

3.1.4 The diffusion approximation of Galton-Watson processes 

The diffusion equation (5.1) appeared already in [Feller 1939], see Eq. (4) in Section 2.2. However,
as we have seen there, certain issues concerning the (scaling) limits of Galton-Watson
processes had remained unresolved in [Feller 1939]. Towards 1950, Feller was ready to attack
this. As to the diffusion approximation of a sequence of “nearly critical” Galton-Watson
processes by (5.1), Feller gives a proof in Appendix II. His idea is to take the iterates $f_n$ of
the offspring generating function (which are known to describe the generating functions of the
subsequent generation sizes) to their scaling limit. This limit turns out to satisfy the PDE (12.9)
(which, in turn, corresponds to (5.1)). Feller writes:

We effect this passage to the limit formally: it is not difficult to justify these steps,
since the necessary regularity properties of the generating functions $f_n(x)$ were
established by Harris IfTSlI.

Again, from today’s perspective, an alternative way is provided by the convergence of generators,
see [8]. In the very last lines of his Appendix II, Feller remains a bit sketchy when he
writes that

the boundary condition $u(t,0)$ follows from the fact that in the branching process
the probability mass flowing out into the origin tends to zero. 

In fact, for the solution $Z$ of (5.1') (with $Z_0 = 1$, say), the probability mass flowing out into
the origin is non-zero at any fixed time $t$, and the density of $Z_t$ does not vanish near the origin.
Again, the desire to obtain clarity on questions like these may have been a motivation for
Feller’s then upcoming research on the boundary behaviour of one-dimensional diffusions.
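
The passage to the limit via iterated generating functions can be watched at work in a case where everything is explicit. The sketch below is our own illustration, not Feller's Appendix II: it assumes the critical geometric offspring law $P(k) = 2^{-(k+1)}$, whose generating function $f(s) = 1/(2-s)$ has the iterates $f_n(s) = (n - (n-1)s)/(n + 1 - ns)$, and compares the Laplace transform of the rescaled generation size $Z_{Nt}/N$ (started with $N$ individuals) with the Laplace transform $\exp(-x\theta/(1 + \beta\theta t))$ of the critical Feller diffusion started at $x$. Since the offspring variance is 2, this corresponds to $\alpha = 0$ and $\beta = 1$ in (5.1'). All function names are hypothetical.

```python
import math


def offspring_pgf(s):
    """Generating function of the offspring law P(k) = 2**-(k+1) (mean 1, variance 2)."""
    return 1.0 / (2.0 - s)


def iterate_pgf(s, n):
    """n-fold iterate f_n(s): generating function of the size of generation n."""
    for _ in range(n):
        s = offspring_pgf(s)
    return s


def closed_form_iterate(s, n):
    """Closed form of the n-fold iterate for this linear-fractional case."""
    return (n - (n - 1) * s) / (n + 1 - n * s)


def rescaled_laplace_transform(theta, t, big_n):
    """E[exp(-theta * Z_{Nt} / N)] for the process started with N = big_n individuals."""
    n = int(round(big_n * t))
    return iterate_pgf(math.exp(-theta / big_n), n) ** big_n


def feller_laplace_transform(theta, t, beta=1.0, x=1.0):
    """Laplace transform of the critical Feller diffusion at time t, started at x."""
    return math.exp(-x * theta / (1.0 + beta * theta * t))


if __name__ == "__main__":
    theta, t = 2.0, 1.5
    for big_n in (10, 100, 1000):
        print(big_n, rescaled_laplace_transform(theta, t, big_n),
              feller_laplace_transform(theta, t))
```

Letting $\theta \to \infty$ in the limiting transform gives $P(Z_t = 0) = \exp(-x/(\beta t)) > 0$, which illustrates the remark above that the mass flowing into the origin does not vanish at any fixed time.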

3.1.5 From two-type to continuum-type generalizations of Feller’s branching diffusion 

In the introduction, Feller points out that serious difficulties arise if one wishes to construct
population models with interactions among the individuals, and that the situation grows worse 
if the population consists of different types of individuals. He then writes: 

In fact, the bivariate branching process leads to such difficulties that apparently not 
one single truly bivariate case has been treated in the literature. In the theory of 
evolution this difficulty is overcome by the assumption of a constant population size 
[... ] In Section 10 the assumption of constant population size is dropped and a
truly bivariate model is constructed which takes into account selective advantages 
in a more flexible way. [... ] The same limiting process which leads [... ] to the 
diffusion equation of Wright’s theory can be applied to our new bivariate model 
and leads to a diffusion equation in two dimensions. 
These two-dimensional Markov processes with branching property have been taken up and
analysed in a broader context in 1969 in the paper |[^ which carries that title. Already before,
Watanabe had published his seminal paper [29] which established Feller’s branching diffusions
with a continuum of types. This together with the pioneering work of Don Dawson gave rise
to a class of processes that were later called Dawson-Watanabe superprocesses ([|S IH). A
good part of Perkins’ Saint Flour Lecture Notes (part 2 of dUl) is devoted to superprocesses with
interactions, and thus is fully on the line of Feller’s program to construct population models
with different types of individuals and with interactions among the individuals.

3.1.6 The inner life of Feller’s branching diffusion: excursions and continuum trees 

This is a good place to mention another fascinating development which is connected with 
Feller’s branching diffusions and is associated with the names of Daniel Ray and Frank Knight 
(the latter was Feller’s doctoral student and Ed Perkins’ PhD advisor). 

Thanks to the branching property (and the thereby implied infinite divisibility), the random
path $Z$ of a Feller branching diffusion is a Poissonian sum of countably many “Feller
excursions” $\zeta$. In fact, each of them has an “internal life” in the sense that $\zeta_t$ is the size at time $t$ of a
continuum population originating from one single ancestor. The genealogical tree of this
population can be described by a Brownian excursion $\eta$ reaching level $t$, which can be imagined as
the ‘exploration path’ of a continuum random tree whose mass alive at level $t$ is $\zeta_t$. The second
Ray-Knight theorem says that $\zeta_t$ can be represented as the local time spent by $\eta$ at level $t$. In
a discrete setting of Galton-Watson processes, this is anticipated in Harris’ work [11] with
its section on walks and trees. The correspondence between a Feller excursion $\zeta$ and an Itô
excursion $\eta$ is depicted on the first page of [25], framed by pictures of Feller and Itô, who met
in person at Princeton in 1954. See ll^ for more explanations, and references to groundbreaking
developments that dealt with the genealogical structure behind Feller’s branching diffusion,
such as Aldous’ Continuum Random Tree (which plays in the world of random trees a similar
role to that of Brownian motion in the classical invariance principle) and Le Gall’s Random
Snake, which provides a representation of the Dawson-Watanabe super-Brownian motion as a
continuum-tree-indexed Markov motion.

For more on excursions and excursion point processes in relation with Feller’s work, see 
Section 3.1 of the contribution of M. Fukushima. 

3.1.7 Frequencies in multivariate continuum branching: conditioning and time change 

Another interesting question which Feller addresses at the end of his introduction concerns the 
relative frequencies in a bivariate model of branching diffusions. Feller writes: 

[... ] it is to be observed that in no truly bivariate case does the gene frequency 
satisfy a diffusion equation (Sections 6 and 10). In fact, if the population size is
not constant, then the gene frequency is not a random variable of a Markov process. 

Thus, conceptually at least, the assumption of a constant population size plays a 
larger role than would appear on the surface. 

Indeed, as it turns out (and Feller may have been well aware of this), one way of passing
from (5.1) to (7.1), say with $\alpha = \gamma_1 = \gamma_2 = 0$, is to consider two independent Feller branching
diffusions $Z^{(1)}$ and $Z^{(2)}$ (solutions of (5.1)), conditioned to $Z^{(1)} + Z^{(2)} = 1$. Of course, this
must be given a precise meaning, and this has been done in a much more general context
in the papers by Etheridge and March Q and Perkins [26]. The title of Perkins’ paper is
programmatic: A Dawson-Watanabe superprocess conditioned to have constant mass one is a
Fleming-Viot process (which is the continuum-type, and thus measure-valued, generalization
of the Wright-Fisher diffusion). In the context of (5.1) to (7.1) this means that under the
conditioning $Z^{(1)} + Z^{(2)} = 1$ the process $Y := Z^{(1)}$ is a Wright-Fisher diffusion. On the level of
genealogies, the conditioning to a constant total mass takes the continuum random forest that
underlies (5.1) into Kingman’s coalescent.

A second way to get from (5.1) to (7.1) (again for $\alpha = \gamma_1 = \gamma_2 = 0$) is to consider the
relative frequency $Y = Z^{(1)}/(Z^{(1)} + Z^{(2)})$ after a time change $ds = dt/(Z^{(1)}_t + Z^{(2)}_t)$. In this way, the
relative frequencies again become Markovian and, by an easy application of Itô’s formula, turn
out to solve (7.1).
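
The Itô computation behind this time change can be sketched as follows. The sketch rests on assumptions that are not taken from the text: for $\alpha = 0$, equation (5.1) is read as the stochastic differential equation $dZ^{(i)}_t = \sqrt{2\beta Z^{(i)}_t}\,dW^{(i)}_t$ with independent Brownian motions $W^{(1)}, W^{(2)}$; the notation $S_t := Z^{(1)}_t + Z^{(2)}_t$ and $Y_t := Z^{(1)}_t / S_t$ is introduced here for illustration; and everything is considered only as long as $S_t > 0$.

```latex
% Assumed normalization (not taken from the text): for i = 1, 2,
%   dZ^{(i)}_t = \sqrt{2\beta Z^{(i)}_t}\, dW^{(i)}_t,  with W^{(1)}, W^{(2)} independent.
% Notation: S_t = Z^{(1)}_t + Z^{(2)}_t,  Y_t = Z^{(1)}_t / S_t,  valid while S_t > 0.
\begin{align*}
dY_t &= \frac{Z^{(2)}_t}{S_t^2}\,dZ^{(1)}_t - \frac{Z^{(1)}_t}{S_t^2}\,dZ^{(2)}_t
        + \frac{1}{2}\left(-\frac{2Z^{(2)}_t}{S_t^3}\cdot 2\beta Z^{(1)}_t
        + \frac{2Z^{(1)}_t}{S_t^3}\cdot 2\beta Z^{(2)}_t\right)dt \\
     &= \frac{Z^{(2)}_t}{S_t^2}\,dZ^{(1)}_t - \frac{Z^{(1)}_t}{S_t^2}\,dZ^{(2)}_t, \\
d\langle Y\rangle_t
     &= \frac{2\beta\left(Z^{(2)}_t\right)^2 Z^{(1)}_t
        + 2\beta\left(Z^{(1)}_t\right)^2 Z^{(2)}_t}{S_t^4}\,dt
      = \frac{2\beta\,Y_t(1-Y_t)}{S_t}\,dt .
\end{align*}
```

The drift terms cancel, so $Y$ is a local martingale, but its quadratic variation still depends on the total mass $S_t$; this is why $Y$ alone is not Markovian in the original time scale. On the new time scale $ds = dt/S_t$ the factor $1/S_t$ disappears and $d\langle Y\rangle_s = 2\beta\,Y_s(1-Y_s)\,ds$, which in this normalization is the quadratic variation of a neutral Wright-Fisher diffusion.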

The concept of time change is central also in the work of John Lamperti. Lamperti’s work
can be seen as a direct continuation and extension of Feller’s ideas, introducing and analysing
the continuum mass limits of Galton-Watson processes also for heavy-tailed offspring
distributions ETl . His article, which was communicated by H. P. McKean, another former PhD
student of Feller, introduced what is now called Lamperti’s transform, a time change which
establishes the link between Lévy processes and continuous state branching processes.

To conclude: Feller’s paper Diffusion processes in genetics is a remarkable contribution at
the interface of probability theory and population biology, with enduring stimulations in either
direction. It takes a central place in the development of mathematical population genetics, and
has triggered substantial new directions in the modern theory of stochastic processes.

3.2 The cost of natural selection 

Feller’s article On fitness and the cost of natural selection [Feller 1967] appeared in Genetical
Research Cambridge, a renowned biology journal. The introduction contains the disclaimer 

This paper is written by a mathematician, and accordingly no new biological models
or hypotheses are advanced.

It may be added that the mathematics is fairly elementary as well, and the strength of the
article is the concise conceptual thinking by which Feller dissects the logic of an argument
that enormously influenced the genetic thinking of that time, and finds a fundamental weak
point in it.

3.2.1 The 1960s and the neutral theory of population genetics 

As hinted at already, the 1960s were turbulent times for genetics, and for population genetics
in particular. Until then, the variation between individuals could only be observed at the
phenotypical level, and much of this was easily explained by selection: Stronger beaks crack harder
nuts; webbing eases swimming; fat pads protect against the cold. Then, in the 1960s, the first
observations of variation at the molecular level became available - not yet via sequencing, but
via so-called restriction fragment length polymorphisms (RFLP) of DNA or via gel electrophoresis of
proteins. The resolution of these methods is lower than that of sequencing but, nevertheless,
the variation was so much larger than expected on phenotypic grounds that researchers were
shocked, and were puzzled about the question: Can this all be explained by selection?

These considerations were strongly influenced by the concept of the genetic load, coined by
Haldane [10]; in particular the concept of the substitutional load. This is the number of selective
deaths, that is, the number of individuals ‘killed’ by selection in the process of substituting one
type by a fitter one. Put differently, the substitutional load (or cost of natural selection) is the
number of excess individuals that must be produced in a population under selection. In 1968,
Kimura [fTSlI concluded that, if a large fraction of the observed variability is selective, the load is
forbidding. This led to one of the most influential and conflict-prone hypotheses of evolutionary
theory, namely, to the so-called neutral theory, which claims that the overwhelming proportion
of the observed molecular variation is selectively neutral, that is, most mutations do not change
fitness.

In what follows, we look more closely into the concept of the substitutional load, and into 
Feller’s criticism of it. We restrict ourselves to the case of haploid populations (i.e., carrying 
only one copy of the genetic information) that reproduce asexually (Sections 1-5 in Feller’s 
paper). In Sections 6-9, he tackles additional complications that emerge in diploid individuals 
(with two copies of every gene), but the conceptual issues are more transparent in the haploid 
case. 

3.2.2 Absolute and relative frequencies in population genetics 

Feller considers a population of individuals that consists of our two types A and a, large enough
to justify deterministic treatment. He assumes discrete generations where every A-individual
leaves an average of $\mu$ offspring for the next generation, whereas every a-individual produces
an average of $\mu' = \mu(1 - k)$ descendants, $0 < k < 1$. The quantities $\mu$ and $\mu'$ are known as
(absolute) fitnesses in population genetics. Each of the two subpopulations then grows (or
decays) geometrically,

(5)    $N_n = \mu N_{n-1}, \qquad N'_n = \mu(1 - k) N'_{n-1},$

so $N_n = N_0 \mu^n$ and $N'_n = N'_0 (\mu(1 - k))^n$, where $N_n$ and $N'_n$ denote the sizes of the A- and
a-subpopulations, respectively.

Now population genetics is traditionally more concerned with relative frequencies of types
than with absolute ones; the main reason is that relative frequencies are easier to measure. One
therefore considers

$p_n = \frac{N_n}{N_n + N'_n}, \qquad q_n = \frac{N'_n}{N_n + N'_n}.$

Clearly, under (5),

(6)    $q_n = \frac{q_0 (1 - k)^n}{p_0 + q_0 (1 - k)^n},$

which is Feller’s Eq. (3.5). Obviously, the powers of $\mu$ cancel out, which is a strength and a
weakness at the same time: On the one hand, this means that knowledge only of the ratio of the
fitness values is required to predict the behaviour of the population. Actually, (6) holds more
generally than the simple derivation may suggest: It continues to hold if (5) is replaced by

(7)    $N_n = \mu N_{n-1} f(N_{n-1} + N'_{n-1}), \qquad N'_n = \mu(1 - k) N'_{n-1} f(N_{n-1} + N'_{n-1}).$

Here $f$ is a function that depends on the total population size only and acts on both types in the
same way. In typical ecological models, one uses a monotonically decreasing function $f$ with
$f(0) = 1$ in order to describe how population size reduces the per capita number of offspring due
to competition. In particular, for suitable choices of $f$, both the population size and the relative
frequencies will, in the long run, approach stationary values.

The downside of thinking in terms of relative fitnesses is that one loses sight of the absolute
population sizes. This may lead to absurd conclusions, in particular in cases where one or
both subpopulations go extinct. This leads us to Feller’s criticism of the genetic load.
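
The claim that the relative frequencies behave identically under geometric growth (5) and under the density-regulated recursion (7) can be checked numerically. The following is a minimal sketch, not part of Feller's paper: the function names, the parameter values and the particular choice $f(N) = 1/(1 + N/1000)$ are hypothetical and serve only as an illustration.

```python
def relative_frequencies(n_big, n_small, mu, k, generations, f=lambda total: 1.0):
    """Frequency q_n of type a under recursion (7); f == 1 recovers (5)."""
    big, small = float(n_big), float(n_small)
    q = []
    for _ in range(generations + 1):
        q.append(small / (big + small))
        damping = f(big + small)  # the same factor acts on both types
        big, small = mu * big * damping, mu * (1.0 - k) * small * damping
    return q


def closed_form(q0, k, n):
    """Right-hand side of (6)."""
    w = (1.0 - k) ** n
    return q0 * w / ((1.0 - q0) + q0 * w)


# Hypothetical parameters, chosen only for illustration.
MU, K, GENERATIONS = 2.0, 0.1, 20
geometric = relative_frequencies(100, 900, MU, K, GENERATIONS)
regulated = relative_frequencies(
    100, 900, MU, K, GENERATIONS,
    f=lambda total: 1.0 / (1.0 + total / 1000.0),  # decreasing, f(0) = 1
)
for n in range(GENERATIONS + 1):
    print(n, round(geometric[n], 6), round(regulated[n], 6),
          round(closed_form(0.9, K, n), 6))
```

The three printed columns coincide up to rounding, although the absolute population sizes in the two runs are very different; this is exactly the information that is lost when one looks at relative frequencies only.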

3.2.3 The substitutional load and Feller’s criticism 

Feller now recalls Haldane’s definition of the genetic load: In generation $n$, the mixed
population has a loss of $d_n := k q_n$ offspring relative to a population consisting of A individuals only,
where $d_n$ must be measured in units of the total population size of generation $n$. Over $M$
generations, Haldane takes the sum $D := \sum_n d_n$ of these losses as the total genetic load. Calculating this with reasonable
parameters and as $M \to \infty$, he arrives at a representative value of 30 for the cost of substitution
of one gene by a fitter one.

Haldane’s definition of D makes sense only if population size remains constant over time, 
or if changes in population size are so small that they can safely be neglected. Haldane does 
not make this explicit; Feller has an eye on both possibilities. 

Feller first considers the case that the population size is not constant; rather, $N_n$ and $N'_n$
behave as in (5). Since $\mu = 1$ and $k < 1$ is assumed, this means that $N_n = N_0$ and $N'_n \to 0$ as
$n \to \infty$, so the total population size decreases from $N_0 + N'_0$ to $N_0$. On an absolute scale, the loss
of individuals due to selection is $N'_n - N'_{n+1} = k N'_n$ in generation $n$, and altogether

$(N'_0 - N'_1) + (N'_1 - N'_2) + \dots = N'_0,$

in agreement with the decrease of the total population size from $N_0 + N'_0$ to $N_0$. In contrast,
Haldane’s $D$, which neglects the size change, can give much larger values; in particular, it can
be larger than the number of a individuals ever born. Obviously, this understanding of the load
produces severe artifacts.
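
The discrepancy between the two ways of counting can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not a computation from Feller's paper: the parameter values are hypothetical, $\mu = 1$ as in the text, and Haldane's $D$ is converted into a number of individuals by multiplying with the initial total size $N_0 + N'_0$, that is, by pretending that the population size stays constant.

```python
def haldane_load(q0, k, generations):
    """Haldane's D: sum of d_n = k * q_n, with q_n taken from (6)."""
    p0 = 1.0 - q0
    total = 0.0
    for n in range(generations):
        w = (1.0 - k) ** n
        total += k * q0 * w / (p0 + q0 * w)
    return total


def absolute_bookkeeping(n0_prime, k, generations):
    """With mu = 1: selective deaths k * N'_n summed up, and all a individuals ever alive."""
    size = float(n0_prime)
    deaths = 0.0
    ever_alive = 0.0
    for _ in range(generations):
        ever_alive += size
        deaths += k * size
        size *= 1.0 - k
    return deaths, ever_alive


# Hypothetical parameters, chosen only for illustration.
N0, N0_PRIME, K, M = 1.0, 999.0, 0.5, 200
q0 = N0_PRIME / (N0 + N0_PRIME)
D = haldane_load(q0, K, M)
deaths, ever_alive = absolute_bookkeeping(N0_PRIME, K, M)
print("Haldane's D (units of population size):", D)
print("D times initial total size:", D * (N0 + N0_PRIME))
print("absolute selective deaths:", deaths)    # approaches N'_0
print("a individuals ever alive:", ever_alive)  # approaches N'_0 / k
```

With these values the absolute number of selective deaths approaches $N'_0$, while $D$ multiplied by the initial total size exceeds even the total number of a individuals that ever lived (counting the founding generation), which is the kind of artifact the text describes.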

Feller then discusses how Haldane’s argument may be ‘rescued’. One possibility is to 
keep population size constant by immigration from a reservoir population, in exactly the same 
proportions as in the current population under consideration. Then, Haldane’s D gives the 
correct answer (but it must be kept in mind that the cost of selection is then borne by the 
reservoir population). The second possibility is to assume an ecological model rather than 
geometric growth. Feller speaks of $\mu$ depending on population size; maybe as in our Eq. (7),
where $\mu$ is replaced by $\mu f(N_n + N'_n)$. But more general models are also possible; for example,
each type may be affected by competition in a different way. The genetic load would then be 
identified with the decrease in the stationary population size due to the reduced reproduction 
rate of one type. But now there is a wealth of possible models, and the genetic load would 
depend on the details. Specifying these is a task Feller assigns to the biologists. 

3.2.4 Feller’s criticism in the general context 

With remarkable insight, Feller dissects a conceptual problem of his time: Load arguments can
be inconsistent if they blindly rely on relative frequencies. His fellow researchers in biology
do, however, not seem to have taken too much notice of his criticism. After all, as already
mentioned in Section 3.2.1, a year after Feller’s paper (and without citing it), Kimura did put
forward his neutral theory of molecular evolution, to a large extent on the basis of load
arguments ifTSll . More precisely, Kimura used an extension of Haldane’s argument. He assumed
a sequence of numerous genetic loci (rather than Haldane’s and Feller’s single locus), each of
which can be of a favourable or a less favourable type, with fitness assumed as multiplicative
across loci. As a consequence, there is a multitude of possible genotypes; fitness differences
between individuals can become enormous; and the load (if calculated in Haldane’s manner)
can become astronomical (Kimura and Ohta ifT^ give a value of D = 10^®). Here the
misconception lies in the assumption of multiplicativity over loci, which is completely unrealistic,
but was hardly questioned at that time. For the details, see the insightful presentation in
[9, Chapter 2.11].

Let us return to the original question that load arguments were supposed to answer: Can 
all the variation observed at the molecular level be explained by selection? Indeed, today, a 
large fraction of the molecular variation is considered selectively neutral or nearly so, although 
single mutations with spectacular selective effects are well known. But this insight is no longer 
built on load arguments - rather, the assumption of neutrality has proved extremely successful 
in describing patterns of genetic variation. 

Last but not least, it should be noted that there is a lot of truth in Feller’s general warning 
not to neglect population size in population genetics. Indeed, load arguments are not the only 
artifacts of this kind. Another example is the famous phenomenon of Muller’s ratchet, which 
describes the ad infinitum accumulation of deleterious mutations due to stochastic effects in 
finite populations of constant size. If described in terms of an ecologically more realistic (and 
conceptually more correct) model with variable population size, the accumulation does not 
continue forever. Rather, when fitness has declined below a threshold value, the population 
experiences a mutational meltdown, which ultimately leads to extinction (see [HI for a review).